Abstract

This study presents a state-of-the-art method for automatically segmenting liver tumors in Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI). The work is significant because it tackles the hard problem of liver tumor segmentation with a deep learning model that uses 4D (spatial plus temporal) information. The proposed model combines 3D Convolutional Neural Networks (CNNs) with Convolutional Long Short-Term Memory (ConvLSTM) networks, specifically designed to capture the spatial and temporal information in the dynamic DCE-MRI imaging sequence. DCE-MRI provides a complete picture of the vascular dynamics of the liver and therefore carries information vital for precise tumor segmentation. By combining 3D CNNs with ConvLSTM networks, the model exploits both spatial and temporal features, which gives it a more detailed understanding of the changes occurring over time. This integration of 4D information overcomes the difficulties caused by the constantly changing nature of DCE-MRI data and greatly improves the accuracy and consistency of liver tumor segmentation. The main goals of this work are to implement and optimize the proposed deep learning model; training and calibrating the model so that it accurately captures liver tumor characteristics in dynamic imaging sequences is of utmost importance.

Keywords: Liver Tumor, Deep Learning, LSTM, CNN, CT Scan, Liver Segmentation. 


1. Introduction 


Liver cancer is one of the leading causes of cancer-related mortality. Globally, hepatocellular carcinoma (HCC), the most prevalent form of primary liver cancer, ranks third in cancer-related mortality and fifth among major malignant tumors [1]. Successful tumor excision requires early detection and treatment of HCC [2]. Accurate tumor segmentation makes it possible to measure volume-based quantitative information, including textural properties, which helps in planning liver treatments, classifying therapeutic responses, classifying hepatic tumors, and estimating patient survival. Liver tumors are still frequently segmented by manual delineation, although this is time-consuming, labor-intensive, and subject to variability between operators [3-6]. Many computer-aided approaches to liver and lesion segmentation have been proposed, all based on conventional image processing algorithms [7-9]. Automated segmentation is difficult because tumor shape, appearance, and location vary greatly, borders are often indistinct, and contrast agents introduce noise [10-12]. Because these approaches can use only limited information, such as intensity, they suffer from leakage across hazy tumor borders [13-14]. In recent years, the advent of deep learning has considerably advanced medical image processing. Brain tumor segmentation and prostate cancer detection are two examples of successful applications of deep Convolutional Neural Networks (CNNs) [15-17]. Deep learning has also been applied to imaging of healthy livers, staging of liver fibrosis, classification of liver status, identification of hepatic tumors, and differentiation of liver masses [18]. As data-driven algorithms, deep learning models automatically learn high-level features from images, which has improved performance in these areas [19]. Deep learning has also proven highly effective for liver tumor segmentation: all of the top-ranked algorithms in the 2017 Liver Tumour Segmentation (LiTS) challenge used it [20]. The main contributions of this paper are:
• Image denoising using a non-adaptive threshold
• Segmentation using a fused U-Net
• Feature extraction using Histogram of Oriented Gradients (HOG)
• Classification using 3D Convolutional Neural Networks (CNNs) with Convolutional Long Short-Term Memory (ConvLSTM)
The rest of the article is organized as follows. Section 2 reviews liver tumor diagnostic methods proposed in prior work. Section 3 presents the proposed model. Section 4 reports and discusses the experimental results, and the Conclusion summarizes the findings.
1.1. Motivation of the Paper 
By presenting a deep learning model that uses 4D data, this study tackles the difficulties of liver tumor segmentation in DCE-MRI. Combining 3D CNN and ConvLSTM networks allows the model to exploit the dynamic nature of DCE-MRI while capturing both spatial and temporal characteristics. The driving motivation is to improve accuracy where current approaches are inadequate. In line with the growing need for clinical accuracy, and with the goal of improving patient outcomes, the study offers a robust solution that advances accurate tumor segmentation in medical imaging.
2. Background Study

• Bakrania et al. [1] investigate the transformative role of machine learning (ML) models in the clinical detection of primary and metastatic liver tumors. Their in-depth research sheds light on how AI might change diagnosis and offers new ways of thinking about patient care. By examining the impact of these models, the work highlights the need to incorporate sophisticated technologies into clinical practice to enable more precise and faster diagnosis.

• Esposito et al. [2] present a novel method for evaluating primary human liver cancer cells using AI-assisted Raman spectroscopy. Their work demonstrates how AI can enhance diagnostic capability by offering deep insight into the molecular makeup of malignant cells. By combining Raman spectroscopy with machine learning, the study opens new possibilities for accurate, non-invasive cancer detection.

• Gavini and Lakshmi [3] propose a novel CNN-based long short-term memory (LSTM) model that predicts the grade of liver tumors from CT scans. Accurate tumor grading is an essential part of therapy planning, and their study shows that deep learning algorithms perform well on this task. Using associated features extracted from CT scans, the model attains high accuracy, giving clinicians important information for individualizing patient treatment.

• Hendi et al. [4] explore an adaptive deep learning strategy for subtyping and predicting liver disorders. Their research highlights the importance of personalized medicine in improving patient outcomes, and its focus on tailoring deep learning models to each patient contributes to improved disease management.

• Kang et al. [5] apply AI to forecast safe liver resection volumes for major hepatectomy. By improving resection planning for the benefit of both patients and surgeons, their work addresses an important facet of liver surgery. Incorporating AI algorithms into preoperative planning aims to improve surgical decision-making and patient outcomes.

• Khan et al. [6] present a multi-input deep neural network for multiclass liver cancer detection. Their work improves diagnostic accuracy by combining data from several input modalities, and the complementary information these modalities provide supports better liver cancer diagnosis and subsequent treatment decisions.

2.1. Problem Definition 

This study addresses the precise segmentation of liver tumors in Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI). Because DCE-MRI data change over time, current approaches struggle to capture the intricacies of liver tumor characteristics. To overcome these obstacles, the research proposes a new 4D deep learning model that combines a 3D CNN with ConvLSTM networks. The aim is to improve the accuracy and reliability of tumor segmentation by using the spatial and temporal information in DCE-MRI dynamic imaging sequences. The study also emphasizes implementing the proposed model in practice and evaluating its effectiveness in clinical settings.

3. Materials and Methods 
This section outlines the experimental setup and techniques used in this paper. It describes the research methods, including data collection, model construction, and analysis, in detail, and is intended as a guide both for judging the scientific rigor of the presented results and for replicating the experiments.

3.1. Dataset Collection 
The dataset was collected from the Kaggle website (https://www.kaggle.com/datasets/andrewmvd/lits-png). Liver cancer ranks fifth among cancers in men and ninth among cancers in women, and more than 840,000 new cases were reported in 2018. Figure 1 shows the overall architecture of the proposed approach.
Figure 1 Overall Architecture
3.2. Image Denoising using Non-adaptive Threshold


Non-adaptive thresholding is a simple image denoising method that applies one constant threshold value uniformly across the whole image to decide whether each pixel is signal or noise. The threshold is determined from the noise properties and the intended level of denoising, and each pixel is then classified as signal or noise according to how its intensity compares with the threshold. Denoising filters, such as median or Gaussian filtering, are then used to modify or replace the noise pixels. Although non-adaptive thresholding is computationally efficient and easy to implement, it can perform poorly on complicated image structures or when the noise properties vary spatially. This work builds on the result of Zhao, Z. et al. (2023) that, asymptotically, the time required to locate defective items in a noisy setting can be bounded and a maximum number of tests can be determined. In threshold group testing (TGT) with a gap, constructing a suitable matrix reduces both the number of encoding tests and the decoding time. To demonstrate this point, we apply Algorithm 1 to the (n, d, u; z]-disjunct matrix.


There must exist numbers such that (e − r) (a) holds, where S is the defective set and z is a positive integer. A non-adaptive approach can then identify S using only (x, e − i, r; y] tests in an (x, e − i, r; y]-TGT model with at most (x, e − i, r; y] erroneous outcomes, where the decoding complexity is:


ose -invixr (C)+e-9 GF )C5)@))—-@ 
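The fixed-threshold denoising step described at the start of this section can be illustrated with a short NumPy sketch. It is a minimal sketch, not the authors' implementation. It assumes that the single global threshold is compared with each pixel's absolute deviation from its 3 × 3 local median, which is one common way to make the signal-or-noise decision. Flagged pixels are then replaced by that median, corresponding to the median-filtering option mentioned above. The function names and the threshold value used below are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge replication, written in plain NumPy."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1))


def nonadaptive_threshold_denoise(img: np.ndarray, threshold: float):
    """Flag pixels whose deviation from the local median exceeds one global
    (non-adaptive) threshold, and replace only those pixels by the median.

    Assumption: the signal/noise decision uses |pixel - local median|.
    Returns the denoised image and the boolean noise mask.
    """
    img = img.astype(np.float64)
    med = median3x3(img)
    noise_mask = np.abs(img - med) > threshold  # same threshold for every pixel
    out = img.copy()
    out[noise_mask] = med[noise_mask]
    return out, noise_mask


# Usage: a flat image with a single impulse-noise pixel.
image = np.full((5, 5), 100.0)
image[2, 2] = 255.0
clean, mask = nonadaptive_threshold_denoise(image, threshold=50.0)
print(int(mask.sum()), clean[2, 2])  # -> 1 100.0
```

Because one threshold is used everywhere, this sketch shares the limitation noted above: it cannot adapt when the noise level varies across the image.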


3.3. Segmentation using Fused U-Net 
To increase segmentation accuracy, Fused U-Net uses a modified U-Net architecture that incorporates multiple input modalities or features. U-Nets are known for capturing spatial information effectively at multiple scales, owing to their encoder-decoder structure with skip connections (Weimin, W. et al., 2024). Fused U-Net takes the original image as input and fuses it with other sources of information, such as complementary feature maps or other imaging modalities. This data fusion improves the network's capacity to detect diverse image features and ultimately yields more accurate segmentation. During training, the network learns to combine data from the different modalities or features efficiently in order to identify objects or regions of interest in the input images. Thanks to this ability to fuse data from multiple sources, Fused U-Net has proven effective in medical image segmentation, especially in difficult cases involving structures with unclear borders or complex shapes. The Fused U-Net architecture has skip pathways connecting the encoder and decoder, and its backbone is a dense convolution block consisting of several convolution layers, each preceded by a concatenation layer. In the concatenation layer, the input and output of the preceding and current convolution layers are combined. Denoting the convolution block (convolution followed by an activation function) by D, the max-pooling operation by P, and the up-sampling function by T, each node $x^{i,j}$ is computed with Equation (2):

$$
x^{i,j} =
\begin{cases}
D\left(P\left(x^{i-1,j}\right)\right), & j = 0 \\
D\left(\left[\left[x^{i,k}\right]_{k=0}^{j-1},\; T\left(x^{i+1,j-1}\right)\right]\right), & j > 0
\end{cases}
\tag{2}
$$

where [·] denotes concatenation, i indexes the down-sampling level along the encoder, and j indexes the convolution node along the skip pathway.
A node at j = 0 receives input only from the preceding encoder layer, whereas a node at j > 0 receives the outputs of the j preceding nodes in the same skip pathway together with the up-sampled output $T(x^{i+1,j-1})$ of the lower skip pathway. Compared with a regular U-Net, the Fused U-Net design offers several benefits, including improved segmentation accuracy and better preservation of fine-grained features, and its hierarchical skip connections allow it to handle objects of varying sizes.
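To make the indexing of Equation (2) concrete, the following NumPy sketch computes every node $x^{i,j}$ of a small nested-skip-pathway network of depth 3. It is illustrative only and performs no training. The convolution block D is replaced by a randomly initialized 1 × 1 convolution followed by ReLU, P is 2 × 2 max pooling, and T is nearest-neighbour 2× up-sampling. Channel widths are assumed to double at each level, and the first node $x^{0,0}$ is assumed to be computed directly from the input image without pooling. All function names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def P(x):
    """2x2 max pooling on a (C, H, W) array with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def T(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def make_block(c_in, c_out):
    """Stand-in for the convolution block D: random 1x1 convolution + ReLU."""
    w = rng.standard_normal((c_out, c_in)) / np.sqrt(c_in)
    return lambda x: np.maximum(0.0, np.einsum("oc,chw->ohw", w, x))


def nested_unet_nodes(image, depth=3, base=4):
    """Compute x^{i,j} for all i + j < depth following Equation (2)."""
    nodes, blocks = {}, {}

    def D(key, x, c_out):
        if key not in blocks:  # one block per node, created on first use
            blocks[key] = make_block(x.shape[0], c_out)
        return blocks[key](x)

    for j in range(depth):
        for i in range(depth - j):
            c_out = base * 2 ** i
            if j == 0:
                # Encoder backbone: x^{i,0} = D(P(x^{i-1,0})); x^{0,0} uses the image.
                src = image if i == 0 else P(nodes[(i - 1, 0)])
                nodes[(i, 0)] = D((i, 0), src, c_out)
            else:
                # Skip pathway: concatenate x^{i,0..j-1} with T(x^{i+1,j-1}).
                skips = [nodes[(i, k)] for k in range(j)]
                up = T(nodes[(i + 1, j - 1)])
                nodes[(i, j)] = D((i, j), np.concatenate(skips + [up], axis=0), c_out)
    return nodes


nodes = nested_unet_nodes(rng.standard_normal((1, 16, 16)))
for key in sorted(nodes):
    print(key, nodes[key].shape)
```

The top-row node $x^{0,2}$, which would feed a segmentation head, keeps the full 16 × 16 resolution while aggregating features from every deeper level through the nested skip pathways.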
3.4. Feature Extraction using Histogram of 
Oriented Gradients 
Feature extraction using the Histogram of Oriented Gradients (HOG) is a well-known computer vision technique for representing the local gradient information in an image. The image is divided into small spatial regions called cells, and the direction and magnitude of the gradients within each cell are computed. These gradient values are used to build a histogram of gradient orientations that quantifies the distribution of gradient directions inside the cell. To capture spatial interactions between cells, neighboring cells are commonly grouped into larger blocks, whose histograms are then normalized to make them more robust to variations in illumination and contrast. The resulting HOG descriptor encodes the local edge and texture patterns of the image, which makes it well suited to object detection, recognition, and classification, and particularly to situations in which an object's shape or texture discriminates between classes. The approach rests on the observation that the shape and appearance of local objects can often be described accurately by the distribution of local intensity gradients or edge orientations, even without precise knowledge of the gradient or edge positions. HOG has a long history of application in human detection, and the same idea appears in mature form in the Scale-Invariant Feature Transform (SIFT). The HOG descriptor is built by accumulating gradient directions over each cell into a 1D histogram; the combination of these histograms forms the feature vector used in the later stages. Let L denote the grayscale (intensity) image under examination. After dividing the image into cells of N pixels, the x and y gradients at each pixel are computed as follows:
$$
G_x(x, y) = L(x+1, y) - L(x-1, y), \qquad G_y(x, y) = L(x, y+1) - L(x, y-1) \tag{3}
$$
The single HOG feature vector h, formed by merging the normalized block features, is normalized as follows:

$$
h \leftarrow \frac{h}{\sqrt{\lVert h \rVert_2^2 + \epsilon^2}} \tag{4}
$$

$$
h_n \leftarrow \min(h_n, \tau) \tag{5}
$$

$$
h \leftarrow \frac{h}{\sqrt{\lVert h \rVert_2^2 + \epsilon^2}} \tag{6}
$$

Here h_n denotes the n-th element of h, τ is a positive cutoff (τ = 0.2), and ε is a small constant that prevents division by zero. Clipping with τ prevents a few large gradients from having an outsized impact and obscuring every other feature in the image. In the resulting HOG descriptor, there are four times as many cell histograms as blocks.
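The gradient rule and the clipping-and-renormalization scheme of Equations (4) to (6) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code. The 8-pixel cells, 9 unsigned orientation bins over 0° to 180°, 2 × 2-cell blocks with a one-cell stride, hard (non-interpolated) binning, and ε = 10⁻⁶ are all assumptions; only τ = 0.2 comes from the text. In practice, a library routine such as scikit-image's `skimage.feature.hog` (with `block_norm='L2-Hys'`) would normally be used instead.

```python
import numpy as np


def hog_descriptor(L, cell=8, bins=9, block=2, tau=0.2, eps=1e-6):
    """Minimal HOG: centred-difference gradients, per-cell orientation
    histograms, and L2-Hys normalisation of overlapping 2x2-cell blocks."""
    L = L.astype(np.float64)
    gx = np.zeros_like(L)
    gy = np.zeros_like(L)
    gx[:, 1:-1] = L[:, 2:] - L[:, :-2]  # L(x+1, y) - L(x-1, y), x = column
    gy[1:-1, :] = L[2:, :] - L[:-2, :]  # L(x, y+1) - L(x, y-1), y = row
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    n_cy, n_cx = L.shape[0] // cell, L.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Magnitude-weighted vote of each pixel into its orientation bin
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=bins)

    features = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            h = hist[by:by + block, bx:bx + block].ravel()
            h = h / np.sqrt(np.sum(h ** 2) + eps ** 2)  # Eq. (4)
            h = np.minimum(h, tau)                      # Eq. (5): clip at tau
            h = h / np.sqrt(np.sum(h ** 2) + eps ** 2)  # Eq. (6)
            features.append(h)
    return np.concatenate(features)


image = np.random.default_rng(0).random((32, 32))
descriptor = hog_descriptor(image)
print(descriptor.shape)  # -> (324,)
```

With a 32 × 32 image and 8-pixel cells, there are 4 × 4 cells and 3 × 3 overlapping blocks. Each block contributes four 9-bin cell histograms, which gives 9 × 36 = 324 features.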

3.5. Classification using 3D Convolutional 
Neural Networks (CNN) with Convolutional 
Long Short-Term Memory 

Classification using 3D CNNs with ConvLSTM networks is an effective way to analyze spatiotemporal data, especially in applications involving volumetric or video data. 3D convolutional neural networks (CNNs) extend their 2D predecessors and are built specifically to handle three-dimensional data, such as medical volumetric images or video frames. These networks consist of 3D convolutional layers, which extract spatial features over the whole volume and thereby capture spatial relationships in the input data. Combining 3D CNNs with ConvLSTM layers also allows the model to detect temporal relationships in sequential data. Like regular LSTM cells, ConvLSTM cells have a memory cell and gates that regulate the flow of information; however, ConvLSTM performs these operations as convolutions, which enables the network to learn spatiotemporal patterns directly from the input. During training, the 3D convolutional layers learn to extract spatial features from each frame or volume of the input. The extracted features are then passed to the ConvLSTM layers, which model the dependencies across successive frames or volumes. This lets the model learn and represent the temporal dynamics of the data more effectively, which improves its classification performance. A convolutional neural network is a deep learning model commonly used to analyze images. The CNN with ConvLSTM processes an image through a sequence of steps, including convolutional layers with filters, batch normalization layers, pooling layers, non-linear activations, and fully connected (FC) layers. The trainable parameters of the model are the convolution filter weights and the fully connected weights. As shown in Table 3.1, the proposed CNN model consists of five convolutional layers (conv_1 to conv_5) and two fully connected layers (fc_1 and fc_2), and its input layer accepts 256 × 256 images. A convolution layer applies a series of filters to the input image, each producing a different activation map. The linear convolution of an F × F filter k(m, n) with an image x(m, n) produces the output y(m, n):
$$
y(m, n) = \sum_{i=1}^{F} \sum_{j=1}^{F} k(i, j)\, x(m - i,\, n - j) \tag{7}
$$


Applying K filters to an input volume of dimensions (X_1, Y_1, Z_1) produces an output volume of dimensions (X_2, Y_2, K), where

$$
X_2 = \frac{X_1 - F + 2P}{S} + 1, \qquad Y_2 = \frac{Y_1 - F + 2P}{S} + 1
$$

and P and S denote the zero-padding and the stride, respectively. Each batch normalization layer is followed by a ReLU activation layer, and each ReLU operation is followed by a max-pooling layer.
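The convolution formula above and the standard output-size relation X_2 = (X_1 − F + 2P)/S + 1 can be checked numerically with a small sketch. Here P (zero-padding) and S (stride) are assumed parameter names. The direct convolution flips the kernel, as true convolution requires, and computes only the "valid" region (no padding, stride 1). Function names and example values are illustrative, not taken from the proposed model.

```python
import numpy as np


def conv2d_valid(x, k):
    """Direct 2D convolution of image x with an F x F kernel k.
    The kernel is flipped (true convolution, not cross-correlation) and only
    positions where the kernel fits entirely inside x are computed (P = 0, S = 1)."""
    F = k.shape[0]
    k_flipped = k[::-1, ::-1]
    H, W = x.shape
    y = np.zeros((H - F + 1, W - F + 1))
    for m in range(y.shape[0]):
        for n in range(y.shape[1]):
            y[m, n] = np.sum(x[m:m + F, n:n + F] * k_flipped)
    return y


def output_size(x1, F, P=0, S=1):
    """Spatial output size of a convolution layer: (X1 - F + 2P) / S + 1."""
    return (x1 - F + 2 * P) // S + 1


x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
y = conv2d_valid(x, k)
print(y)                              # every entry equals x[m+1, n+1] - x[m, n] = 5
print(output_size(256, 3, P=1))       # -> 256 (padding 1 keeps a 256 x 256 input)
print(output_size(256, 3, P=1, S=2))  # -> 128
```

Integer division reproduces the flooring that deep learning frameworks apply when the stride does not divide the padded extent exactly.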
Algorithm 1: CNN with Convolutional Long 
Short-Term Memory 
Input: 

• A 4D tensor representing sequential volumetric data, where the dimensions correspond to batch size, depth, height, and width.

Steps: 
1. Input Preparation: Preprocess input data to 
ensure uniform dimensions and format. 
Normalize input data if necessary. 


2. Model Construction: Define the architecture 
of the 3D CNN with ConvLSTM model, 
specifying the number and size of 
convolutional layers, LSTM layers, and fully 
connected layers. 

3. Model Compilation: Compile the model by 
specifying the optimizer, loss function, and 
evaluation metrics. 



4. Training: Train the model on a labeled dataset
of sequential volumetric data, adjusting the
model parameters using backpropagation
and optimization algorithms to minimize the
loss function.

Output: 

• Probability distribution over the classes for each input sequence.
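To illustrate how the ConvLSTM stage of Algorithm 1 carries information across time steps and ends in a probability distribution over classes, here is a minimal NumPy sketch of a single ConvLSTM cell followed by global average pooling and a softmax output layer. It is not the proposed model. The sketch makes several simplifying assumptions:

- It processes one sample at a time, with no batch dimension.
- Each time step's input is assumed to be a 2D feature map, for example one already produced by the 3D convolutional layers.
- The peephole terms of the original ConvLSTM formulation are omitted.
- The weights are random and untrained.

All names and sizes are hypothetical. In practice, a framework layer such as Keras's `ConvLSTM2D` would be used.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)


def conv_same(x, w):
    """'Same'-padded 2D cross-correlation (the convention used in deep learning
    frameworks). x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    p = w.shape[-1] // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(xp, w.shape[-2:], axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", win, w)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvLSTMCell:
    def __init__(self, c_in, c_hid, k=3):
        scale = 1.0 / np.sqrt((c_in + c_hid) * k * k)
        # A single kernel produces the input, forget, output and candidate gates.
        self.w = rng.standard_normal((4 * c_hid, c_in + c_hid, k, k)) * scale
        self.b = np.zeros((4 * c_hid, 1, 1))
        self.c_hid = c_hid

    def step(self, x, h, c):
        z = conv_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(z, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update memory cell
        h = sigmoid(o) * np.tanh(c)                    # new hidden state
        return h, c


def classify_sequence(frames, cell, w_fc):
    """frames: (T, C_in, H, W) -> probability distribution over classes."""
    _, _, H, W = frames.shape
    h = np.zeros((cell.c_hid, H, W))
    c = np.zeros_like(h)
    for x in frames:  # iterate over time steps
        h, c = cell.step(x, h, c)
    pooled = h.mean(axis=(1, 2))  # global average pooling -> (c_hid,)
    logits = w_fc @ pooled
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


cell = ConvLSTMCell(c_in=2, c_hid=4)
w_fc = rng.standard_normal((3, 4))  # hypothetical 3-class output layer
probs = classify_sequence(rng.standard_normal((5, 2, 16, 16)), cell, w_fc)
print(probs.shape, round(float(probs.sum()), 6))
```

The gates are computed by convolution rather than by matrix multiplication, so the hidden state keeps its spatial layout across time steps. This property lets ConvLSTM model spatiotemporal patterns.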

4. Results and Discussion 
This section reports and examines the study's findings in depth. It analyzes the experimental results, assesses how well the proposed techniques work, and discusses what the results mean in relation to the goals of the study.
Figure 2 Training Accuracy Comparison Chart
Figure 2 shows the training and validation accuracy curves; the x-axis shows epochs and the y-axis shows accuracy.
Figure 3 Training Loss
Figure 3 shows the training loss; the x-axis shows epochs and the y-axis shows the training loss. Figure 4 shows the denoised image.

Figure 4 Denoised Image

Figure 5 Grayscale Histogram
Figure 5 shows the grayscale histogram; the x-axis shows the grayscale value and the y-axis shows the number of pixels. Figure 6 shows the feature values chart.

Figure 6 Feature Values Chart


Performance Comparison of Existing and Proposed Methods 


Table 1 Classification Performance Metrics Comparison

Method                   Accuracy (%)   Precision (%)   Recall (%)   F-measure (%)
Existing methods: SVM    94.32          90.74           91.37        90.21
Existing methods: LSTM   95.24          92.54           92.14        93.00
Existing methods: CNN    96.37          95.00           –            95.00
Proposed method          98.22          97.32           97.10        96.92

Figure 7 Performance Metrics Comparison Chart
As Table 1 and Figure 7 show, the proposed technique outperforms the existing methods on every evaluation metric: accuracy, precision, recall, and F-measure. Its accuracy of 98.22% is higher than that of SVM (94.32%), LSTM (95.24%), and CNN (96.37%). It also achieves higher precision (97.32%), and therefore fewer false positives, than SVM (90.74%), LSTM (92.54%), and CNN (95.00%). Its recall (97.10%), which reflects the ability to recognize true positives, is well above that of SVM (91.37%) and LSTM (92.14%). Furthermore, the proposed method outperforms SVM (90.21%), LSTM (93.00%), and CNN (95.00%) in F-measure, indicating a good balance between precision and recall. Overall, these results show that the proposed technique produces more reliable classifications than the compared methods.
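For reference, the four metrics in Table 1 follow from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) of a binary classifier. The sketch below computes them. The counts in the usage line are made-up toy values, not the study's data. For multiclass problems, the per-class values would additionally need to be averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


m = classification_metrics(tp=90, fp=10, fn=5, tn=95)  # toy counts
print({name: round(value, 4) for name, value in m.items()})
# -> {'accuracy': 0.925, 'precision': 0.9, 'recall': 0.9474, 'f_measure': 0.9231}
```

Because the F-measure is the harmonic mean of precision and recall, a high value requires both to be high. This is why the text treats it as an indicator of balanced performance.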
Conclusion 

This study presents a cutting-edge method for automated liver tumor segmentation designed specifically for DCE-MRI. To tackle the challenges of segmenting liver tumors in dynamic imaging sequences, it uses a 4D deep learning model that combines a 3D CNN with ConvLSTM networks. DCE-MRI provides a complete picture of the liver's vascular dynamics, and because the proposed model incorporates both the spatial and the temporal information it contains, it greatly improves tumor segmentation accuracy. The study's central goals, optimizing the model and evaluating its performance, reflect its commitment to improving tumor segmentation methods. The work adds to the body of knowledge in medical imaging by showing how the proposed strategy could improve upon, or even replace, current practices. A 4D deep learning model that incorporates both spatial and temporal information is well suited to the growing need for accuracy in healthcare settings. The study's novel methodology shows potential for enhancing diagnostic and therapeutic capabilities for liver tumors, which may in turn benefit patient outcomes.